The analysis of data sometimes requires fitting many free parameters in a theory to a large number
of data points. Questions naturally arise about the compatibility of specific subsets of the data, 
such as those from a particular experiment or those based on a particular technique, with the rest of 
the data. Questions also arise about which theory parameters are determined by specific subsets of 
the data. I present a method to answer both of these kinds of questions. The method is illustrated 
by applications to recent work on measuring parton distribution functions. 



1. INTRODUCTION 

There are many situations where data from a vari- 
ety of different experiments must be fitted to a sin- 
gle underlying theory that has many free parameters. 
The particular instance that led to this work is the 
measurement of parton distribution functions (PDFs),
which describe momentum distributions of quarks and 
gluons in the proton [J, i, i, i, i] . 

In these situations, it would be desirable to assess 
the consistency between the full body of data and indi- 
vidual subsets of it, such as data from a particular ex- 
periment, or data that rely on a particular technique, 
or data in which a particular kind of theoretical or 
experimental systematic error is suspected. It would 
also be desirable to characterize which parameters in 
the fit are determined by particular components of the 
input data. This paper presents a "Data Set Diagonal- 
ization" (DSD) procedure that answers both of those 
desires. 



2. NEW EIGENVECTOR METHODS

The quality of the fit of a theory to a set of data is
measured by a quantity $\chi^2$, which in simplest form is
given by

$$\chi^2 = \sum_{i} \left( \frac{D_i - T_i}{E_i} \right)^2 , \qquad (1)$$

where $D_i$ and $E_i$ represent a data point and its uncertainty,
and $T_i$ is the theoretical prediction. (Although
(1) is standard practice, some alternatives might be
worth consideration 0].)

The predictions $T_i$ in Eq. (1) depend on a number of
parameters $a_1, \ldots, a_N$. The best-fit estimate for those
parameters is obtained by adjusting them to minimize
$\chi^2$. The uncertainty range is estimated as the neighborhood
of the minimum in which $\chi^2$ lies within a
certain "tolerance criterion" $\Delta\chi^2$ above its minimum
value. If the errors in the data are random and Gaussian
with standard deviations truly given by $E_i$, and
the theory is without error, the appropriate $\Delta\chi^2$ can
be related to confidence intervals by standard statistical
methods. Those premises do not hold in the application
of interest here; but the tolerance range can be
estimated by examining the stability of the fit in response
to applying different weights to subsets of the
data [J, 0,01.

Sufficiently close to its minimum, $\chi^2$ is an approximately
quadratic function of the parameters
$a_1, \ldots, a_N$. Using the eigenvectors of the matrix that
defines that quadratic form as basis vectors in the $N$-dimensional
parameter space, one can define new theory
parameters $z_1, \ldots, z_N$ which are linear combinations
of the original ones

$$a_i - a_i^{(0)} = \sum_{j=1}^{N} W_{ij}\, z_j , \qquad (2)$$

and which transform $\chi^2$ into the very simple form

$$\chi^2 = \chi_0^2 + \sum_{i=1}^{N} z_i^2 . \qquad (3)$$

Here $a_i^{(0)}$ denotes the best-fit value of $a_i$ and $\chi_0^2$ is the
minimum value of $\chi^2$.


Formally, the transformation matrix $W$ can be computed
by evaluating the Hessian matrix $\partial^2 \chi^2 / \partial a_i\, \partial a_j$
at the minimum using finite differences, and computing
its eigenvectors. The new parameters $z_i$ are then
just coefficients that multiply those eigenvectors when
the original coordinates $a_1, \ldots, a_N$ are expressed as
linear combinations of them. In the PDF application,
this straightforward procedure breaks down because
the eigenvalues of the Hessian span a huge range
of magnitudes, which makes non-quadratic behavior
complicate the finite-difference method at very different
scales for different directions in parameter space.
However, this difficulty can be overcome by an iterative
technique [1] that is reviewed in the Appendix.
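
As a minimal numerical sketch of this formal recipe (not the code used in the analysis), the following Python fragment evaluates the simplest $\chi^2$ for a hypothetical straight-line theory, estimates the Hessian by central finite differences, and builds $W$ from its eigenvectors. The toy data, the model $T_i = a_1 + a_2 x_i$, the step size, and all function names are assumptions made for illustration only; the factor 1/2 in the Hessian follows the normalization adopted in the Appendix.

```python
import numpy as np

# Hypothetical toy data: four points with equal uncertainties.
x = np.array([0.0, 1.0, 2.0, 3.0])
data = np.array([1.1, 2.9, 5.2, 6.8])
err = np.full(4, 0.5)

def chi2(a):
    """Simplest chi^2: sum over data points of ((D_i - T_i) / E_i)^2."""
    theory = a[0] + a[1] * x          # assumed straight-line theory
    return float(np.sum(((data - theory) / err) ** 2))

def half_hessian(f, a0, step=1e-3):
    """H_ij = (1/2) d^2 f / da_i da_j at a0, by central finite differences."""
    n = len(a0)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = step
            ej = np.zeros(n); ej[j] = step
            H[i, j] = (f(a0 + ei + ej) - f(a0 + ei - ej)
                       - f(a0 - ei + ej) + f(a0 - ei - ej)) / (8.0 * step ** 2)
    return H

# Best fit: with equal uncertainties this is an ordinary least-squares line.
slope, intercept = np.polyfit(x, data, 1)
a_best = np.array([intercept, slope])

H = half_hessian(chi2, a_best)
eps, V = np.linalg.eigh(H)            # eigenvalues and orthonormal eigenvectors
W = V / np.sqrt(eps)                  # column j = eigenvector j scaled by 1/sqrt(eps_j)

# A displacement z in the new coordinates should raise chi^2 by sum(z^2).
z = np.array([1.0, -2.0])
delta_chi2 = chi2(a_best + W @ z) - chi2(a_best)
```

Because this toy $\chi^2$ is exactly quadratic, a single pass suffices; the iterative refinement mentioned above matters only when the eigenvalues span many orders of magnitude and non-quadratic terms are present.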

The linear transformation (2) that leads to (3) is not
unique, since any further orthogonal transform of the
coordinates $z_i$ will preserve that form. Such an orthogonal
transformation can be defined using the eigenvectors
of any symmetric matrix. After this second linear
transformation of the coordinates, the chosen symmetric
matrix will be diagonal together with $\chi^2$. The second
transformation can be combined with the first to
yield a single overall linear transformation of the form (2).
Thus there is a freedom to diagonalize an additional
symmetric matrix while maintaining the simple
form (3) for $\chi^2$.

That symmetric matrix can be taken from the matrix
of second derivatives that appears when the variation
of any function of the fitting parameters is expanded
in Taylor series through second order. Thus it
is possible within the quadratic approximation to diagonalize
any one chosen function of the fitting parameters,
while maintaining the diagonal form for $\chi^2$. An
explicit recipe for this "rediagonalization" procedure
is given in the Appendix.
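
The two-step construction can be sketched numerically as follows. This is an illustrative fragment under stated assumptions, not the author's implementation: `H` is taken to be the (1/2)-normalized second-derivative matrix of $\chi^2$, `G2` the corresponding matrix for the additional function, and the 2x2 matrices are invented toy values. Here the additional function is taken to be the part of $\chi^2$ that comes from one portion of the data, so that `H` is the sum of two positive-definite pieces.

```python
import numpy as np

def rediagonalize(H, G2):
    """Return (W, gamma) with W.T @ H @ W = identity and W.T @ G2 @ W = diag(gamma).

    H  : positive-definite (1/2)*second-derivative matrix of chi^2
    G2 : symmetric (1/2)*second-derivative matrix of the additional function
    """
    eps, V = np.linalg.eigh(H)
    W1 = V / np.sqrt(eps)             # first step: chi^2 becomes a sum of squares
    Q = W1.T @ G2 @ W1                # quadratic form of G in those coordinates
    gamma, U = np.linalg.eigh(Q)      # orthogonal rotation preserves the sum of squares
    order = np.argsort(gamma)[::-1]   # descending eigenvalues
    return W1 @ U[:, order], gamma[order]

# Toy example (assumed numbers): chi^2 = chi^2_S + chi^2_rest.
H_S = np.array([[2.0, 0.5], [0.5, 0.3]])
H_rest = np.array([[1.0, -0.2], [-0.2, 3.0]])
H = H_S + H_rest

W, gamma = rediagonalize(H, H_S)
```

In this toy case both pieces are positive definite, so every entry of `gamma` lies strictly between 0 and 1.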

The freedom to diagonalize an additional quantity
along with $\chi^2$ can be exploited in several ways:

1. The traditional approach in which one only diagonalizes
the Hessian matrix is formally equivalent
to also diagonalizing the displacement distance
$D$ from the minimum point in the space of
the original fitting parameters:

$$D^2 = \sum_{i=1}^{N} \left( a_i - a_i^{(0)} \right)^2 . \qquad (4)$$

In this approach, the final eigenvectors can
usefully be ordered by their eigenvalues, from
"steep" directions in which $\chi^2$ rises rapidly with
$D$, to "flat" directions in which $\chi^2$ varies very
slowly with $D$. This option has been used in the
iterative method that was developed for previous
CTEQ PDF error analyses [1].

2. One can diagonalize the contribution to $\chi^2$ from
any chosen subset $S$ of the data. This option
is the basis of the DSD procedure, which is described
in the next Section and applied in the
rest of the paper.

3. One can diagonalize some quantity $G$ that is of
particular theoretical interest, such as the prediction
for some unmeasured quantity. In this
way, one might find that a small subset of the
eigenvectors is responsible for most of the range
of possibilities for that prediction, which would
simplify the application of the Hessian method.
An example of this was given in a recent PDF
study. However, there is no guarantee in general
that the diagonal form will be dominated by
a few directions with large coefficients ($\beta_i$ and/or
$\gamma_i$ in Eq. (28) of the Appendix). Hence a better
scheme to reduce the number of important eigenvectors
might well be to simply choose the new
$z_1$ along the gradient direction $\partial G / \partial z_i$, and then
to choose the new $z_2$ along the orthogonal direction
that carries the largest residual variation,
etc.



3. THE DSD METHOD

Let us diagonalize the contribution $\chi_S^2$ from some
chosen subset $S$ of the data. That puts its contribution
to the total $\chi^2$ into a diagonal form

$$\chi_S^2 = \mathrm{const} + \sum_{i=1}^{N} \left( 2 \beta_i z_i + \gamma_i z_i^2 \right) , \qquad (5)$$

while preserving (3), as is derived in the Appendix.
The contribution $\chi_{\bar S}^2 = \chi^2 - \chi_S^2$ from the remainder of
the data $\bar S$ is then similarly diagonal.

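If the parameters $\gamma_i$ all lie in the range $0 < \gamma_i < 1$,
Eqs. (3) and (5) can be written in the form

$$\chi^2 = \chi_S^2 + \chi_{\bar S}^2 ,$$
$$\chi_S^2 = \mathrm{const} + \sum_{i=1}^{N} \left( \frac{z_i - A_i}{B_i} \right)^2 ,$$
$$\chi_{\bar S}^2 = \mathrm{const} + \sum_{i=1}^{N} \left( \frac{z_i - C_i}{D_i} \right)^2 . \qquad (6)$$

These equations have an obvious interpretation that is
the basis of the DSD method: In the new coordinates,
the subset $S$ of the data and its complement $\bar S$ take the
form of independent measurements of the $N$ variables
$z_i$ in the quadratic approximation. The results from
Eq. (6) can be read as

$$z_i = A_i \pm B_i \ \ \text{according to } S , \qquad z_i = C_i \pm D_i \ \ \text{according to } \bar S , \qquad (7)$$

where

$$A_i = -\beta_i / \gamma_i , \quad B_i = 1/\sqrt{\gamma_i} , \quad C_i = \beta_i / (1 - \gamma_i) , \quad D_i = 1/\sqrt{1 - \gamma_i} . \qquad (8)$$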
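
The coefficients of the two measurements can be checked by completing the square. The short derivation below assumes that the subset contribution has the diagonal form $\chi_S^2 = \mathrm{const} + \sum_i (2\beta_i z_i + \gamma_i z_i^2)$ and that the total is $\chi^2 = \chi_0^2 + \sum_i z_i^2$, so that the complement contributes $(1-\gamma_i) z_i^2 - 2\beta_i z_i$ along each direction. The last line is a consistency check: the weighted average of the two measurements reproduces the overall best fit $z_i = 0$.

```latex
\begin{align*}
\gamma_i z_i^2 + 2\beta_i z_i
  &= \gamma_i \left( z_i + \frac{\beta_i}{\gamma_i} \right)^2 - \frac{\beta_i^2}{\gamma_i}
  &&\Rightarrow\quad A_i = -\frac{\beta_i}{\gamma_i}, \quad B_i = \frac{1}{\sqrt{\gamma_i}}, \\
(1-\gamma_i) z_i^2 - 2\beta_i z_i
  &= (1-\gamma_i) \left( z_i - \frac{\beta_i}{1-\gamma_i} \right)^2 - \frac{\beta_i^2}{1-\gamma_i}
  &&\Rightarrow\quad C_i = \frac{\beta_i}{1-\gamma_i}, \quad D_i = \frac{1}{\sqrt{1-\gamma_i}}, \\
\frac{A_i/B_i^2 + C_i/D_i^2}{1/B_i^2 + 1/D_i^2}
  &= \frac{-\beta_i + \beta_i}{\gamma_i + (1-\gamma_i)} = 0 .
\end{align*}
```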
Eqs. (7)-(8) provide a direct assessment of the compatibility
between the subset $S$ and the rest of the data
$\bar S$. For if Gaussian statistics can be used to combine
the uncertainties in quadrature, the difference between
the two measurements of $z_i$ is

$$A_i - C_i = \frac{-\beta_i}{\gamma_i (1 - \gamma_i)} \pm \frac{1}{\sqrt{\gamma_i (1 - \gamma_i)}} . \qquad (9)$$

This leads to a chi-squared measure of the overall difference
between $S$ and $\bar S$ along direction $z_i$:

$$\sigma_i^2 = \left( \frac{A_i - C_i}{\sqrt{B_i^2 + D_i^2}} \right)^2 = \frac{\beta_i^2}{\gamma_i (1 - \gamma_i)} . \qquad (10)$$

(The symmetry of (10) under the interchange $\gamma_i \leftrightarrow 1 - \gamma_i$
reflects the obvious symmetry $S \leftrightarrow \bar S$.) Even
in applications where Gaussian statistics cannot be assumed,
the variables $z_i$ are natural quantities for testing
the compatibility of $S$ with the rest of the data.
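
A small numerical sketch of this comparison is given below. It assumes, as labeled in the comments, that along one direction the subset contributes $\mathrm{const} + 2\beta z + \gamma z^2$ to a total of $\mathrm{const} + z^2$; the function name and the returned dictionary are illustrative choices, not part of the original analysis.

```python
import math

def dsd_measurements(beta, gamma):
    """Measurements of one coordinate z implied by S and by its complement.

    Assumes chi2_S = const + 2*beta*z + gamma*z**2 and
    chi2_total = const + z**2, with 0 < gamma < 1.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("quadratic interpretation requires 0 < gamma < 1")
    A = -beta / gamma                    # central value according to S
    B = 1.0 / math.sqrt(gamma)           # uncertainty according to S
    C = beta / (1.0 - gamma)             # central value according to the complement
    D = 1.0 / math.sqrt(1.0 - gamma)     # uncertainty according to the complement
    diff = A - C
    diff_err = math.hypot(B, D)          # uncertainties combined in quadrature
    return {"A": A, "B": B, "C": C, "D": D,
            "diff": diff, "diff_err": diff_err,
            "sigma": abs(diff) / diff_err}

result = dsd_measurements(beta=0.3, gamma=0.8)
```

For these assumed inputs the difference is $-\beta/[\gamma(1-\gamma)] = -1.875$ with uncertainty $1/\sqrt{\gamma(1-\gamma)} = 2.5$, so the two measurements differ by 0.75 standard deviations.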

Eqs. (7)-(8) also directly answer the question "What
is measured by the subset $S$ of data?". For, provided
$S$ is compatible with its complement, the variables
$z_i$ that are significantly measured by $S$ are those for
which the uncertainty $B_i$ from $S$ is less than or comparable
to the uncertainty $D_i$ from $\bar S$.
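
The comparison of the two uncertainties depends only on $\gamma_i$, through the ratio $B_i/D_i = \sqrt{(1-\gamma_i)/\gamma_i}$. The fragment below tabulates that ratio and attaches a rough label to each direction. The thresholds 0.8 and 0.2 are an illustrative choice taken from the practical ranges quoted in the text, and the function names are hypothetical.

```python
import math

def uncertainty_ratio(gamma):
    """B_i / D_i = sqrt((1 - gamma) / gamma), valid for 0 < gamma < 1."""
    return math.sqrt((1.0 - gamma) / gamma)

def dominant_set(gamma, high=0.8, low=0.2):
    """Rough label for which data control a direction (thresholds are a convention)."""
    if gamma >= high:
        return "S"
    if gamma <= low:
        return "complement"
    return "both"

for g in (0.1, 0.2, 0.5, 0.8, 0.9):
    print(f"gamma = {g:.1f}   B/D = {uncertainty_ratio(g):.3f}   {dominant_set(g)}")
```

A ratio well below 1 means that $S$ measures the direction more precisely than its complement, and a ratio well above 1 means the opposite.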
TABLE I: Ratio between $B_i$ = uncertainty from $S$ and $D_i$
= uncertainty from $\bar S$, for various $\gamma_i$.

For purposes of orientation, the relationship between
$\gamma_i$ and the ratio of uncertainties
$B_i/D_i = \sqrt{(1 - \gamma_i)/\gamma_i}$ is shown in Table I for some values of
$\gamma_i$ that correspond to simple ratios. In particular,
$\gamma_i = 0.5$ means that $S$ and $\bar S$ contribute equally to
the measurement of $z_i$; while $\gamma_i = 0.9$ means that the
uncertainty from $S$ is three times smaller than from
$\bar S$; and $\gamma_i = 0.1$ means that the uncertainty from $S$ is
three times larger than from $\bar S$. Practically speaking,
one can say that $S$ dominates the measurement of $z_i$
if $\gamma_i \gtrsim 0.8 - 0.9$, while the complementary set $\bar S$ dominates
if $\gamma_i \lesssim 0.1 - 0.2$. Beyond those ranges, the contribution
from the less-important quantity is strongly
suppressed when the weighted average is taken.

Another way to interpret the $\gamma_i$ parameter is as follows.
Pretend that $S$ consists of $N_S$ repeated measurements
of $z_i$, each having the same precision; and that
$\bar S$ similarly consists of $N_{\bar S}$ measurements. The ratio of
uncertainties is then given by

$$\frac{B_i}{D_i} = \sqrt{\frac{N_{\bar S}}{N_S}} \quad \Longrightarrow \quad \gamma_i = \frac{N_S}{N_S + N_{\bar S}} . \qquad (11)$$

Thus $\gamma_i$ can be interpreted as the fraction of the data
that is contained in subset $S$, for the purpose of measuring
$z_i$.

In applications of the DSD method, it is likely that
not all of the $\gamma_i$ parameters will lie in the range
$0 < \gamma_i < 1$. For if $\gamma_i \gtrsim 1$, then $S$ dominates the
measurement of $z_i$, so $\bar S$ is quite insensitive to $z_i$, so
the dependence of $\chi_{\bar S}^2$ on $z_i$ is likely not to be described
well by a quadratic approximation. Similarly $\gamma_i \lesssim 0$
means that $\bar S$ dominates the measurement of $z_i$, so
the small dependence of $\chi_S^2$ on $z_i$ may not be very
quadratic.

Compatibility between $S$ and $\bar S$ along directions for
which $\gamma_i \gtrsim 0.8$ or $\gamma_i \lesssim 0.2$ is not a crucial issue, since
one or the other measurement dominates the average
along such directions. It is an important feature of
the DSD method that it distinguishes between inconsistencies
that do or do not affect the overall fit. In
that sense, it is a more sensitive tool than the previous
method of simply studying $\chi_S^2$ vs. $\chi_{\bar S}^2$ by means of
a variable weight.



4. APPLICATIONS TO PARTON
DISTRIBUTION ANALYSIS


The interpretation of data from high energy collid- 
ers such as the Tevatron at Fermilab and the LHC 
at CERN relies on knowing the PDFs that describe 
momentum distributions of quarks and gluons in the 
proton. These PDFs are extracted by a "global analy- 
sis" [1, m of many kinds of experiments whose results 
are tied together by the theory of Quantum Chromo- 
dynamics (QCD). The analysis described here to illus- 
trate the DSD method is based on 36 data sets with 
a total of 2959 data points. These are the same data 
sets used in a recent PDF analysis [1] , except that two 
older inclusive jet experiments have been dropped for 
simplicity. 

The theory uses the same 24 free parameters as that 
recent analysis. These parameters describe the momentum
distributions $u(x)$, $d(x)$, $\bar u(x)$, $\bar d(x)$, $s(x)$ and
$g(x)$ at a particular small QCD scale. All of the PDFs
at higher scale can be calculated from these by QCD. 

This PDF application is a strong test of the new
method, because the large number of experiments of
different types carries the possibility for unknown experimental
and theoretical systematic errors, and the
large number of free parameters includes a wide range
of flat and steep directions in parameter space.



4.1. E605 experiment 

We first apply the data set diagonalization method 
to study the contribution of the E605 experiment 
to the PDF analysis. This experiment (lepton pair 
production in proton scattering on copper) is sensitive 
to the various flavors of quarks in the proton in a dif- 
ferent way from the majority of the data, so it can be 
expected to be responsible for one or more specific fea- 
tures of the global fit. It is also an experiment where 
unknown systematic errors might be present, since no 
corrections for possible nuclear target effects are in- 
cluded. 

There are 24 free parameters in the fit, and hence
24 mutually orthogonal eigenvector directions. In descending
order, the first 4 of these are found to have
$\gamma_1 = 0.91$, $\gamma_2 = 0.38$, $\gamma_3 = 0.16$, $\gamma_4 = 0.06$. All of
the other eigenvectors have still smaller or even negative
$\gamma_i$. Hence according to the previous discussion,
the fit is controlled mainly by this E605 data set along
eigenvector direction 1; E605 and its complement both
play a role along direction 2; E605 plays a very minor
role along direction 3; and it is unimportant along the
remaining 21 directions.

This is confirmed in Fig. 1, which shows the variation
of $\chi^2$, with the best-fit values subtracted, for E605
(119 data points) and its complement (the remaining
2840 data points) along each of the first four directions.
Along direction 1, the E605 data indeed dominate
the measurement: the "parabola" of $\chi_S^2$ is much
narrower than the "parabola" of $\chi_{\bar S}^2$. The minimum
for the complementary data set $\bar S$ lies rather far from
the best-fit value $z_1 = 0$, but its $\chi^2$ is so slowly varying
that it is not inconsistent with that value. Along direction
2, E605 and its complement are both important,
and the two measures are again seen to be consistent
with each other. For the remaining 2 directions shown,
and the 20 directions that are not shown, the $\bar S$ data
completely dominate: E605 provides negligible information
along those directions. (The $z_4$ curve for E605
ends abruptly, because the fit becomes numerically unphysical
at that point, which is far outside the region
of acceptable fits to $\bar S$.)

FIG. 1: $\chi^2$ for fit to E605 (dashed curves) and to the rest
of the data (solid curves) along the four leading eigenvector
directions in descending order of $\gamma_i$. In each panel, $z_i = 0$
is the location of the overall best fit.

 i   $\gamma_i$   $z_i$ from $S$   $z_i$ from $\bar S$   Difference      $\sigma_i$
 1   0.91         -0.37 ± 1.07      2.94 ± 2.67          -3.31 ± 2.88    1.15
 2   0.38         -1.38 ± 1.61      0.87 ± 1.29          -2.26 ± 2.07    1.09
 3   0.16          0.05 ± 2.45     -0.01 ± 1.10           0.06 ± 2.68    0.02
 4   0.06          1.57 ± 3.92     -0.10 ± 1.03           1.67 ± 4.05    0.41

TABLE II: Consistency between $S$ = E605 experiment and
$\bar S$ = the remainder of data.

The $S$ and $\bar S$ columns of Table II show the information
of Fig. 1 interpreted as measurements of
$z_1, \ldots, z_4$. This can be done according to Eqs. (7)-(8),
or more precisely by fitting each of the curves in
Fig. 1 to a parabolic form in the neighborhood of its
minimum rather than fitting at $z_i = 0$. The Difference
column is the difference between the $S$ and $\bar S$ measurements
of $z_i$, with an error estimate obtained by adding
the $S$ and $\bar S$ errors in quadrature. The final column expresses
this difference in units of its uncertainty, which
would be the number of standard deviations for Gaussian
statistics. The fact that these numbers are $\lesssim 1$
implies that the E605 experiment is consistent with
the rest of the global analysis.
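
The arithmetic behind the last two columns can be reproduced in a few lines. The sketch below uses the rounded direction-1 entries for E605 as input; the function name is an illustrative choice. For other rows the result can differ from the tabulated value in the last digit, presumably because the table was computed from unrounded inputs.

```python
import math

def compare_measurements(z_s, err_s, z_rest, err_rest):
    """Difference of two independent measurements, its quadrature error,
    and the difference expressed in units of that error."""
    diff = z_s - z_rest
    diff_err = math.hypot(err_s, err_rest)
    return diff, diff_err, abs(diff) / diff_err

# Direction 1 of the E605 comparison: S gives -0.37 ± 1.07, the complement 2.94 ± 2.67.
diff, diff_err, sigma = compare_measurements(-0.37, 1.07, 2.94, 2.67)
print(f"{diff:.2f} ± {diff_err:.2f}, sigma = {sigma:.2f}")
```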
4.2. Inclusive jet experiments

We now turn our attention to the role of the CDF
and D0 [11] run II jet experiments in the PDF
analysis. This was the principal subject of a recent
paper [1]; but the DSD technique can shed new light
on it. We first examine the consistency between each
jet experiment and the rest of the data with the other
jet experiment excluded. Results for the leading $\gamma_i$
are shown in Table III for CDF and Table IV for D0.
The CDF experiment plays a strong role along its two
leading directions ($\gamma_1 = 0.75$ and $\gamma_2 = 0.62$), showing
a rather strong tension (3.6 $\sigma$) along $z_2$. The D0
experiment similarly plays a strong role along its two
leading directions ($\gamma_1 = 0.71$ and $\gamma_2 = 0.52$), but it is
consistent with the non-jet data along both of those
directions.
 i   $\gamma_i$   $z_i$ from $S$   $z_i$ from $\bar S$   Difference      $\sigma_i$
 1   0.75          0.55 ± 1.11     -1.74 ± 1.85           2.28 ± 2.15    1.06
 2   0.62          2.66 ± 1.25     -4.34 ± 1.52           7.00 ± 1.96    3.56
 3   0.04         11.26 ± 4.14     -0.58 ± 1.03          11.84 ± 4.26    2.78

TABLE III: Consistency between $S$ = CDF and $\bar S$ = all
non-jet data.

 i   $\gamma_i$   $z_i$ from $S$   $z_i$ from $\bar S$   Difference      $\sigma_i$
 1   0.71          0.49 ± 1.11     -1.33 ± 1.79           1.82 ± 2.11    0.86
 2   0.52          1.05 ± 1.36     -1.26 ± 1.51           2.31 ± 2.03    1.14
 3   0.07         -2.00 ± 3.89      0.14 ± 1.03          -2.14 ± 4.02    0.53

TABLE IV: Consistency between $S$ = D0 and $\bar S$ = all non-jet
data.

 i   $\gamma_i$   $z_i$ from $S$   $z_i$ from $\bar S$   Difference      $\sigma_i$
 1   0.82          0.35 ± 1.08     -1.68 ± 2.31           2.02 ± 2.55    0.79
 2   0.74          1.62 ± 1.15     -4.60 ± 1.89           6.23 ± 2.21    2.81
 3   0.12         -0.19 ± 2.84      0.03 ± 1.07          -0.21 ± 3.04    0.07
 4   0.05          3.14 ± 4.34     -0.16 ± 0.97           3.31 ± 4.44    0.74

TABLE V: Consistency between $S$ = CDF + D0 jet data
and $\bar S$ = all non-jet data.

FIG. 2: $\chi^2$ for fit to CDF + D0 (dashed curves) and to the
remaining data (solid curves), for the four leading directions
in descending order of $\gamma_i$.



Since these jet experiments measure the same process
by similar techniques, it also makes sense to combine
them into a single subset $S$. The result is given
in Table V. The $\gamma_i$ parameters in descending order
are $\gamma_1 = 0.82$, $\gamma_2 = 0.74$, $\gamma_3 = 0.12$, $\gamma_4 = 0.05$, so
these data supply most of the constraint along their
two leading directions, and negligible constraint along
any of the others. The expectation that these two
experiments measure the same thing is confirmed by
the fact that there are still only two directions being
determined, with $\gamma_1$ and $\gamma_2$ larger than for either experiment
alone. Some tension (2.8 $\sigma$) exists between
$S$ and $\bar S$ along $z_2$; but combining the data sets has reduced
the conflict relative to what appeared with CDF
alone.

Figure 2 shows the variation in $\chi^2$ for the fit to the
jet data (72 + 110 points) and its complement (2777
points) along the four leading directions. The numerical
results shown in Table V correspond to fitting
these curves by parabolas at their minima. For the
first two directions, the "parabola" for the jet data $S$
is narrower than the "parabola" for its complement,
as expected since $\gamma_1, \gamma_2 > 0.5$. This confirms that
the jet data dominate the global fit along those directions.
For $z_3$ and $z_4$ (and all other directions, which are
not shown), the jet data supply very little constraint:
the "parabola" is much broader for $S$ than for $\bar S$.
The locations of the minima are quite far apart for $z_2$,
which reflects the tension between $S$ and $\bar S$ along that
direction.

To study the consistency between the two individual
jet experiments within the context of the global fit,
their $\chi^2$ values are plotted separately in Fig. 3 along
the same eigenvector directions as in Fig. 2. There
appears to be a bit of tension between the two experiments
along these directions, since their minima occur
at different places. Quantitatively, fitting each curve
in Fig. 3 to a parabola near its minimum leads to
the results shown in Table VI. The discrepancy between
the jet experiments is 2.4 $\sigma$ and 1.6 $\sigma$ along the
two directions in which these experiments are significant
in the global fit. Any discrepancy between the
jet experiments along other directions, including the
strong difference along direction 4, is not important
for the global fit, because non-jet experiments supply
much stronger constraints along those directions, as is
confirmed by the narrow parabola for $\bar S$.

FIG. 3: $\chi^2$ for fit to CDF (dotted), D0 (dashed), and the
rest of the data (solid).
 i   $z_i$ from CDF   $z_i$ from D0   Difference     $\sigma_i$
 1   2.70 ± 1.65      -2.45 ± 1.38    5.15 ± 2.15    2.40
 2   2.33 ± 1.35      -1.74 ± 2.22    4.07 ± 2.60    1.57

TABLE VI: Consistency between CDF and D0 jet experiments.

The DSD method can also be used to discover which
aspects of a global fit are determined by particular subsets
of the data. An example of this is illustrated by
Fig. 4, which shows the gluon distribution at QCD
scale 1.3 GeV, for PDF sets corresponding to displacements
$z_i = \pm 4$ along each eigenvector direction of the
CDF + D0 fit. Most of the uncertainty is seen to come
from the $z_1$ and $z_2$ directions, which are the directions
found above to be controlled by the jet data. This directly
confirms the conclusion that the jet data
are the major source of information about the gluon
distribution for $x \gtrsim 0.1$.

FIG. 4: Gluon distributions $g(x)$ at $z_1 = 4.0$ (long dash),
$z_1 = -4.0$ (long dash dot), $z_2 = 4.0$ (short dash), $z_2 = -4.0$
(short dash dot), and $z_i = \pm 4.0$ for $i = 3, \ldots, 24$ (solid).
Most of the uncertainty for $g(x)$ comes from eigenvector
directions 1 and 2, which are controlled principally by the
jet experiments according to Fig. 2.



5. CONCLUSION 

A "data set diagonalization" (DSD) procedure has 
been presented, which extends the Hessian method [1]
for uncertainty analysis. The procedure identifies the 
directions in parameter space along which a given sub- 
set S of data provides significant constraints in a global 
fit. This allows one to test the consistency between S 
and the remainder of the data, and to discover which 
aspects of the fit are controlled by S. 

The procedure involves "rediagonalizing" $\chi^2$ to obtain
a new set of fitting parameters $\{z_i\}$ that are linear
combinations of the original ones. The data from
a given experiment or other chosen subset $S$ of the
data and its complement $\bar S$ take the form of independent
measurements of these new parameters, within
the scope of the quadratic approximation. The degree
of consistency between $S$ and $\bar S$ can thus be examined
by standard statistical methods.

The DSD method can be used to study the internal 
consistency of a global fit, by applying it with S de- 
fined by each experimental data set in turn. One can 
also let S correspond to subsets of the data that are 
suspected of being subject to some particular kind of 
unquantified systematic error. A full systematic study 
of the parton distribution fit using the new technique 
is currently in progress. 

Typical applications of the new technique have been
illustrated in the context of measuring parton distribution
functions. The method uncovered and quantified
tension between the two inclusive jet experiments,
and between one of those experiments and the non-jet
data, that was difficult to detect using the older methods,
which are based on tracking the effect on $\chi^2$ for
$S$ and $\bar S$ in response to changing the weight assigned
to $S$ 0,0.

The DSD method can also be used to identify
which features of the fit are controlled by particular
experiments or other subsets of the data in a complex
data set. As an example of this, the jet experiments
were shown to be the principal source of information
on the gluon distribution in the region displayed in
Fig. 4. The logic is as follows: Fig. 4 shows that the
uncertainty of the gluon distribution is dominated by
eigenvector directions 1 and 2 when $S$ is defined as
the jet data; and the range of acceptable fits along
those directions is constrained mainly by the jet data
according to Fig. 2 or Table V.



that defines the best fit to the data:

$$\chi^2 = \chi_0^2 + \sum_{i=1}^{N} \sum_{j=1}^{N} H_{ij}\, x_i x_j , \qquad (12)$$

where $x_i$ is the displacement $a_i - a_i^{(0)}$ from the minimum
in the original parameter space, and the Hessian
matrix is defined by

$$H_{ij} = \frac{1}{2} \left( \frac{\partial^2 \chi^2}{\partial x_i\, \partial x_j} \right) . \qquad (13)$$

(The Hessian matrix is usually defined without the
overall factor 1/2, but the normalization used here is
more convenient for present purposes.) Eq. (12) follows
from Taylor series in the neighborhood of the
minimum. It contains no first-order terms because the
expansion is about the minimum, and terms beyond
second order have been dropped according to the
quadratic approximation.

Since $H$ is a symmetric matrix, it has a complete
set of $N$ orthonormal eigenvectors $V^{(1)}, \ldots, V^{(N)}$:

$$\sum_{j=1}^{N} H_{ij} V_j^{(k)} = \epsilon_k V_i^{(k)} , \qquad (14)$$
$$\sum_{i=1}^{N} V_i^{(j)} V_i^{(k)} = \delta_{jk} , \qquad (15)$$
$$\sum_{k=1}^{N} V_i^{(k)} V_j^{(k)} = \delta_{ij} . \qquad (16)$$


Acknowledgments 

I am grateful to my late colleague and friend Wu-Ki
Tung for the pleasure of many discussions on these issues.
This research was supported by National Science
Foundation grant PHY-0354838.



The eigenvalues $\epsilon_k$ are positive because the best fit
must be at a minimum of $\chi^2$. Multiplying (14) by
$V_m^{(k)}$ and summing over $k$ yields

$$H_{im} = \sum_{k=1}^{N} \epsilon_k V_i^{(k)} V_m^{(k)} . \qquad (17)$$

We can define a new set of coordinates $\{y_i\}$ that describe
displacements along the eigenvector directions:


Appendix: Rediagonalizing the Hessian matrix 

This Appendix describes details of the procedure
that simultaneously diagonalizes the coordinate dependence
of $\chi^2$ and one additional quantity within the
quadratic approximation. The procedure was first described
in Appendix B of 0, but its significance was
not recognized in that paper.

The Hessian method is based on the quadratic expansion
of $\chi^2$ in the neighborhood of the minimum


Then 
(18)
(19) 

(20) 



(21) 
Any additional function $G$ of the original coordinates
$\{a_i\}$ can also be expressed in terms of the new coordinates
$\{y_i\}$ and expanded by Taylor series through
second order:

$$G = G_0 + \sum_{i=1}^{N} P_i\, y_i + \sum_{i=1}^{N} \sum_{j=1}^{N} Q_{ij}\, y_i y_j . \qquad (22)$$

The symmetric matrix $Q$, like $H$, has a complete set
of orthonormal eigenvectors $U^{(1)}, \ldots, U^{(N)}$:

$$\sum_{j=1}^{N} Q_{ij} U_j^{(k)} = \gamma_k U_i^{(k)} , \qquad (23)$$
$$\sum_{i=1}^{N} U_i^{(j)} U_i^{(k)} = \delta_{jk} , \qquad (24)$$
$$\sum_{k=1}^{N} U_i^{(k)} U_j^{(k)} = \delta_{ij} , \qquad (25)$$

from which it follows that

$$Q_{ij} = \sum_{k=1}^{N} \gamma_k U_i^{(k)} U_j^{(k)} . \qquad (26)$$

Defining new coordinates $\{z_i\}$ by

$$z_i = \sum_{j=1}^{N} U_j^{(i)} y_j \qquad (27)$$

now leads to

$$\chi^2 = \chi_0^2 + \sum_{i=1}^{N} z_i^2 ,$$
$$G = G_0 + \sum_{i=1}^{N} 2 \beta_i z_i + \sum_{i=1}^{N} \gamma_i z_i^2 , \qquad (28)$$

where

$$\beta_i = \frac{1}{2} \sum_{j=1}^{N} U_j^{(i)} P_j . \qquad (29)$$

Hence both $\chi^2$ and $G$ are diagonal in the new coordinates
$\{z_i\}$ in the quadratic approximation. Eq. (5),
which is the basis of this paper, follows immediately
from (28) by choosing $G$ to be the contribution to $\chi^2$
from the subset $S$ of the data.

Because non-quadratic behavior appears at widely
different scales in different directions of the original parameter
space, and because the second-derivative matrices
are calculated numerically by finite differences,
it is actually necessary to compute the linear transformation
from the old coordinates $\{a_i - a_i^{(0)}\}$ to the new
coordinates $\{z_i\}$ by a series of iterations [8]. This is
done as follows. The procedure described above yields
a coordinate transformation $W$ defined by

$$a_i - a_i^{(0)} = \sum_{j=1}^{N} W_{ij}\, z_j . \qquad (30)$$


The coordinates $\{z_i\}$ can be treated as "old" coordinates
and the above steps repeated to obtain a refined
set of elements for the matrix $W$. This process is iterated
a few times to obtain the final form of the transformation.
The iterative method is simple to program:
each iteration begins with an estimate of the desired
transformation matrix $W$ in (30) and ends with an
improved version of $W$. One can start with the unit
matrix $W_{ij} = \delta_{ij}$ and iterate until the matrix $W$ stops
changing. This procedure has been found to converge
in all of the applications for which it has been tried.
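
The loop described above can be sketched as follows. This is a toy illustration under explicit assumptions, not the production procedure: the function `toy_chi2` is a hypothetical $\chi^2 - \chi_0^2$ with its minimum at the origin, one very steep direction, one very flat direction, and a small quartic term; the finite-difference step of 1 in the current $z$ coordinates and the stopping rule (all eigenvalues of the re-estimated matrix within a tolerance of 1) are choices made here for robustness against eigenvector sign flips. Each pass also diagonalizes $W^T W$, which is the squared distance in the original parameters expressed in the $z$ coordinates.

```python
import numpy as np

def toy_chi2(a):
    # Hypothetical chi^2 - chi^2_0 with minimum at a = 0 (assumed for illustration).
    return 1.0e4 * a[0] ** 2 + 1.0e-2 * a[1] ** 2 + 1.0e-6 * a[1] ** 4

def half_hessian(f, n, step=1.0):
    """(1/2) * second-derivative matrix of f at the origin, by central differences."""
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = step
            ej = np.zeros(n); ej[j] = step
            H[i, j] = (f(ei + ej) - f(ei - ej)
                       - f(ej - ei) + f(-ei - ej)) / (8.0 * step ** 2)
    return H

def iterate_transformation(f, n, tol=1e-10, max_iter=50):
    W = np.eye(n)                                  # start from the unit matrix
    for _ in range(max_iter):
        # Re-estimate the quadratic form in the current z coordinates.
        H = half_hessian(lambda z: f(W @ z), n)
        eps, V = np.linalg.eigh(H)
        if np.max(np.abs(eps - 1.0)) < tol:        # chi^2 already looks like sum(z^2)
            break
        W = W @ (V / np.sqrt(eps))                 # rescale along the eigenvectors
        _, U = np.linalg.eigh(W.T @ W)             # diagonalize the distance form
        W = W @ U                                  # columns ordered steep to flat
    return W

W = iterate_transformation(toy_chi2, 2)
```

With these toy numbers the first pass removes the factor of roughly $10^6$ between the two curvatures, and later passes only adjust for the quartic term as seen at the finite-difference scale.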

The distance moved away from the minimum in the
original coordinate space is given by

$$\sum_{i=1}^{N} \left( a_i - a_i^{(0)} \right)^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \sum_{k=1}^{N} W_{ki} W_{kj} \right) z_i z_j , \qquad (31)$$

which corresponds to the choice

$$Q_{ij} = \sum_{k=1}^{N} W_{ki} W_{kj} \qquad (32)$$

in the iterative scheme. This choice produces eigenvector
directions that are characterized by how rapidly
$\chi^2$ changes in the original parameter space, leading to a
clear distinction between "steep directions" in which
$\chi^2$ increases rapidly with displacement in the original
parameters, and "flat directions" in which $\chi^2$ increases
only slowly. The degree of steepness or flatness
is measured by the eigenvalues of $Q$.

In the PDF analysis, a large number of free parameters
are used in order to reduce the "parametrization
error" caused by the need to represent unknown continuous
parton distribution functions by approximations
having a finite number of parameters. In that
application, the logarithms of the eigenvalues of $Q$ are
found to be roughly uniformly distributed, with the
smallest and largest eigenvalues having a huge ratio.
As a result, the iterative method has been found to be
necessary even to carry out the conventional Hessian
analysis, where only $\chi^2$ needs to be diagonalized.